
Poster: maxwellc Date: Aug 14, 2013 6:43am
Forum: web Subject: Re: Why can't I access old crawled sites who have recent robot.txt exclusions?

I understand the reasoning, but I am hoping there is a solution.

The reasoning is that if I am a site owner and I suddenly discover that my content has been leaking all this time, the easiest way to plug the leak is to put up a robots.txt, rather than tracking down every archiver and asking each of them to block the content. Then, from the archiver's perspective, if they had to manually process each request, and further validate that the person contacting them actually owns the domain and has the right to request blocking, that is a lot of work for a non-profit to take on, and it distracts from the core work.
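As an illustration, the robots.txt a site owner would put up for this is tiny. The sketch below assumes the owner wants to block only the Internet Archive's crawler, which identifies itself as `ia_archiver` (the user-agent the Wayback Machine has honored for exclusions); a `Disallow: /` covers the whole site:

```
# Block the Internet Archive's crawler from the entire site.
# "ia_archiver" is the user-agent the Wayback Machine honors.
User-agent: ia_archiver
Disallow: /
```

One file at the site root, and every archiver that honors the protocol backs off, which is exactly why it is so much easier than contacting archivers individually.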

Personally, given the amount of domain parking going on, I feel it is more likely that the current domain owner is not the owner of the old content, and has no right to speak for what used to be at that address, than that a genuine leak is finally being plugged. That is especially true for content from 10 years ago.

The FAQ, however, suggests there is also a technical challenge: for simplicity the exclusion is a binary flag, rather than a record of when something was or was not available. The FAQ offers some hope through its use of the word "Currently" in the line
"Currently there is no way to exclude only a portion of a site, or to exclude archiving a site for a particular time period only." (From the FAQ)
It is unclear whether this means work is going on to look into the possibility, or whether it is just a statement of fact that does not preclude a change in the future.
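The binary nature of the flag shows up even in standard tooling: a robots.txt can only answer "is this URL excluded right now", with no notion of a time range. A minimal sketch with Python's `urllib.robotparser` (the domain and paths are hypothetical):

```python
from urllib import robotparser

# Parse a robots.txt body directly (hypothetical content that
# excludes the Wayback Machine's crawler from the whole site).
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: ia_archiver",
    "Disallow: /",
])

# The answer is a plain yes/no per crawler and URL. There is no way
# to express "this was public until 2003" or "exclude only the
# 2001-2005 captures" -- which is the limitation the FAQ describes.
print(rp.can_fetch("ia_archiver", "http://example.com/old-page.html"))  # False
print(rp.can_fetch("otherbot", "http://example.com/old-page.html"))     # True
```

So supporting per-time-period exclusions would require the Archive to keep extra state on its side, not just a richer robots.txt.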